Abstract. Probabilistic coupling is a powerful tool for analyzing pairs of
probabilistic processes. Roughly, coupling two processes requires finding
an appropriate witness process that models both processes in the same
probability space. Couplings are powerful tools for proving properties about
the relation between two processes, including convergence
of distributions and stochastic dominance, a probabilistic version of a
monotonicity property.

While the mathematical definition of coupling looks rather complex 
and cumbersome to manipulate, we show that the relational program 
logic pRHL—the logic underlying the EasyCrypt cryptographic proof 
assistant—already internalizes a generalization of probabilistic coupling. 

With this insight, constructing couplings is no harder than constructing 
logical proofs. We demonstrate how to express and verify classic examples
of couplings in pRHL, and we mechanically verify several couplings in 
EasyCrypt. 

1 Introduction 

Probabilistic couplings [9, 7, 10] are a powerful mathematical tool for reasoning 
about pairs of probabilistic processes: streams of values that evolve randomly 
according to some rule. While the two processes may be difficult to analyze 
independently, a probabilistic coupling arranges processes $\{u_i\}$, $\{v_i\}$ in the same
space, typically by viewing the pair of processes as randomly evolving pairs of
values $\{(u_i, v_i)\}$, coordinating the samples so that each pair of values is related.
In this way, couplings can reason about the relation between the two processes. 

From the point of view of program verification, a coupling is a relational 
program property, since it describes the relation between two programs (perhaps 
one program run on two different inputs, or two completely different programs). 
However, couplings are particularly interesting for several reasons. 

Useful consequences. Couplings imply many other relational properties, and are
a powerful tool in mathematical proofs. 

A classic use of coupling is showing that the distribution of the value of 
two random processes started in different locations eventually converges to the 
same distribution if we run the processes long enough. This property is a kind of 
memorylessness (or Markovian) property: the long-term behavior of the process
is independent of its starting point. To prove memorylessness, the typical strategy
is to couple the two processes so that their values move closer together; once the 
values meet, the two processes move together, yielding the same distribution. 

A different use of couplings is showing that one (numeric-valued) process 
is, in some sense, bigger than the other. This statement has to be interpreted 
carefully—since both processes evolve independently, we can’t guarantee that 
one process is always larger than the other on all traces. Stochastic domination 
turns out to be the right definition: for any $k$, we require $\Pr[u \geq k] \geq \Pr[v \geq k]$.
This property follows if we can demonstrate a coupling of a particular form. 

Relational from non-relational. Often, the behavior of the second coupled process 
is completely specified by the behavior of the first; for instance, the second process 
may mirror the first process. In such cases, the coupling allows us to reason just 
about the first process. In other words, a coupling allows us to prove certain 
relational properties by proving properties of a single program. 

Compositional proofs. Typically, couplings are proved by coordinating corresponding
samples of the two processes, step by step; paper proofs call this process
“building a coupling”, reflecting the piecewise construction of the coupled distribution.
As a result, couplings can be proved locally by considering small pieces of the
programs in isolation, enabling convenient mechanical verification of couplings.


Contributions 

In this paper, we apply relational program verification to probabilistic couplings. 
While the mathematical definition of coupling is seemingly far from program 
verification technology, our primary insight is that the logic pRHL from Barthe,
Grégoire, and Zanella-Béguelin [1] already internalizes coupling in disguise. More
precisely, pRHL is built around a lifting construction, which turns a relation $R$
on two sets $A$ and $B$ into a relation $R^\dagger$ over the set of sub-distributions over $A$
and the set of sub-distributions over $B$. Two programs are related by $R^\dagger$ precisely
when there exists a coupling of their output sub-distributions whose support only
contains pairs of values $(u, v)$ which satisfy $R$.

This observation has three immediate consequences. First, by selecting the 
relation R appropriately, we can express a wide variety of coupling properties, 
like distribution equivalence and stochastic domination. Second, by utilizing the 
proof system of pRHL, we can construct and manipulate couplings while
abstracting away the mathematical details. Finally, we can leverage EasyCrypt, 
a proof assistant implementing pRHL, to mechanically verify couplings. 


2 Preliminaries 

Probabilistic coupling. We begin by giving an overview of probabilistic coupling. 
As we described before, a coupling places two probabilistic processes (viewed as 
probability distributions) in the same probabilistic space. 


We will work with sub-distributions over discrete (finite or countable) sets.
A sub-distribution $\mu$ over a discrete set $A$ is a function $\mu : A \to [0, 1]$ such that
$\sum_{a \in A} \mu(a) \leq 1$; its support $\mathrm{supp}(\mu)$ is the pre-image of $(0, 1]$. We let $\mathrm{Distr}(A)$
denote the set of sub-distributions over $A$. Every sub-distribution can be given a
monadic structure; the unit operator maps every element $a$ in the underlying set
to its Dirac distribution $\delta_a$, and the monadic composition $\mathrm{Mlet}(\mu, F) \in \mathrm{Distr}(B)$
of $\mu \in \mathrm{Distr}(A)$ and $F : A \to \mathrm{Distr}(B)$ is $\mathrm{Mlet}(\mu, F)(b) = \sum_{a \in A} \mu(a) \cdot F(a)(b)$.

When working with sub-distributions over tuples, the probabilistic versions 
of the usual projections on tuples are called marginals. The first and second 
marginals $\pi_1(\mu)$ and $\pi_2(\mu)$ of a distribution $\mu$ over $A \times B$ are defined by
$\pi_1(\mu)(a) = \sum_{b \in B} \mu(a, b)$ and $\pi_2(\mu)(b) = \sum_{a \in A} \mu(a, b)$. We can now formally define coupling.

Definition 1. The Fréchet class $\mathfrak{F}(\mu_1, \mu_2)$ of two sub-distributions $\mu_1$ and $\mu_2$
over $A$ and $B$ respectively is the set of sub-distributions $\mu$ over $A \times B$ such that
$\pi_1(\mu) = \mu_1$ and $\pi_2(\mu) = \mu_2$. Two sub-distributions $\mu_1, \mu_2$ are said to be coupled
with witness $\mu$ if $\mu \in \mathfrak{F}(\mu_1, \mu_2)$, i.e. $\mu$ is in the Fréchet class of $\mu_1, \mu_2$.
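As a small illustration of the definition of coupling, the following Python sketch represents finite distributions as dictionaries and checks membership in the Fréchet class by comparing marginals. The function names (`marginals`, `is_coupling`) and the example distributions are illustrative assumptions, not part of pRHL or EasyCrypt.

```python
from collections import defaultdict
from fractions import Fraction


def marginals(joint):
    """Return the first and second marginals of a joint {(a, b): prob}."""
    first, second = defaultdict(Fraction), defaultdict(Fraction)
    for (a, b), p in joint.items():
        first[a] += p
        second[b] += p
    return dict(first), dict(second)


def is_coupling(joint, mu1, mu2):
    """True iff joint has marginals mu1 and mu2 (zero entries are ignored)."""
    def strip(mu):
        return {x: p for x, p in mu.items() if p != 0}
    pi1, pi2 = marginals(joint)
    return strip(pi1) == strip(mu1) and strip(pi2) == strip(mu2)


half = Fraction(1, 2)
coin = {0: half, 1: half}

# Three different witnesses coupling a fair coin with itself.
independent = {(a, b): half * half for a in (0, 1) for b in (0, 1)}
identical = {(0, 0): half, (1, 1): half}
mirrored = {(0, 1): half, (1, 0): half}

# Wrong marginals: not in the Frechet class of (coin, coin).
not_a_coupling = {(0, 0): Fraction(3, 4), (1, 1): Fraction(1, 4)}
```

The first three joint distributions all have fair-coin marginals, which shows that a pair of distributions usually admits many couplings; choosing among them is what the proofs below do.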

Lifting relations. Before introducing pRHL, we describe the lifting construction. 
This operation allows pRHL to make statements about pairs of (sub-)distributions, 
and is a generalized form of probabilistic coupling. 

The idea is to define a family of couplings based on the support of the witness 
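distribution. Given a relation $R \subseteq A \times B$ and two distributions $\mu_1$ and $\mu_2$ over
$A$ and $B$ respectively, we let $\mathcal{L}_R(\mu_1, \mu_2)$ denote the subset of sub-distributions
$\mu \in \mathfrak{F}(\mu_1, \mu_2)$ such that $\mathrm{supp}(\mu) \subseteq R$. Given a ground relation $R$, we view
distributions in $\mathcal{L}_R$ as witnesses for a lifted relation on distributions.

Definition 2. The lifting of a relation $R \subseteq A \times B$ is the relation $R^\dagger \subseteq
\mathrm{Distr}(A) \times \mathrm{Distr}(B)$ with $\mu_1 \mathrel{R^\dagger} \mu_2$ iff $\mathcal{L}_R(\mu_1, \mu_2) \neq \emptyset$.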

Before turning to the definition of pRHL, we give some intuition for why 
lifting is useful. Roughly, if we know two distributions are related by a lifted 
relation $R^\dagger$, we can treat two samples from the distributions as if they were related
by R. In other words, the lifting machinery gives a powerful way to translate 
between information about distributions and information about samples. Deng 
and Du [6] provide an excellent introductory exposition to lifting, and give several 
equivalent characterizations of lifting. 
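To make the lifting concrete, here is a minimal Python sketch that checks whether a given joint distribution witnesses a lifted relation: it must be a coupling, and its support must lie inside the ground relation. The helper names and the example distributions are assumptions chosen for illustration.

```python
from fractions import Fraction


def marginal(joint, index):
    """Marginal of a joint {(a, b): prob} on component index (0 or 1)."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], Fraction(0)) + p
    return {x: p for x, p in out.items() if p != 0}


def witnesses_lifting(joint, mu1, mu2, relation):
    """True iff joint couples mu1 and mu2 and its support satisfies relation."""
    def strip(mu):
        return {x: p for x, p in mu.items() if p != 0}
    support_ok = all(relation(a, b) for (a, b), p in joint.items() if p > 0)
    return (support_ok
            and marginal(joint, 0) == strip(mu1)
            and marginal(joint, 1) == strip(mu2))


half, quarter = Fraction(1, 2), Fraction(1, 4)
mu1 = {1: half, 2: half}   # uniform on {1, 2}
mu2 = {0: half, 1: half}   # uniform on {0, 1}

shifted = {(1, 0): half, (2, 1): half}
product_joint = {(a, b): quarter for a in (1, 2) for b in (0, 1)}


def successor(a, b):
    return a == b + 1


def at_least(a, b):
    return a >= b
```

Here `shifted` witnesses the lifting of both relations, while `product_joint` is a coupling that witnesses only the weaker relation `at_least`: whether two distributions are related by a lifting depends on the existence of some suitable witness, not on any particular one.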


2.1 A pRHL primer 

We are now ready to present pRHL, a relational program logic for probabilistic 
computations. In its original form [1], implemented in the EasyCrypt proof
assistant [4], pRHL reasons about programs written in an imperative language
extended with random assignments with the following syntax of commands:

c ::= x ← e | x ←$ d | if e then c else c | while e do c | skip | c; c
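[Sample]
\[ \frac{f : T_1 \xrightarrow{1\text{-}1} T_2 \qquad \forall v \in T_1.\ d_1(v) = d_2(f\,v)}{\models x_1 \stackrel{\$}{\leftarrow} d_1 \sim x_2 \stackrel{\$}{\leftarrow} d_2 : \forall v.\ \Phi[v/x_1, f(v)/x_2] \Rightarrow \Phi} \]

[If]
\[ \frac{\Psi \Rightarrow e_1 = e_2 \qquad \models c_1 \sim c_2 : \Psi \wedge e_1 \Rightarrow \Phi \qquad \models c_1' \sim c_2' : \Psi \wedge \neg e_1 \Rightarrow \Phi}{\models \mathsf{if}\ e_1\ \mathsf{then}\ c_1\ \mathsf{else}\ c_1' \sim \mathsf{if}\ e_2\ \mathsf{then}\ c_2\ \mathsf{else}\ c_2' : \Psi \Rightarrow \Phi} \]

[While]
\[ \frac{\Phi \Rightarrow e_1 = e_2 \qquad \models c_1 \sim c_2 : \Phi \wedge e_1 \Rightarrow \Phi}{\models \mathsf{while}\ e_1\ \mathsf{do}\ c_1 \sim \mathsf{while}\ e_2\ \mathsf{do}\ c_2 : \Phi \Rightarrow \Phi \wedge \neg e_1} \]

Fig. 1: Two-sided proof rules (selection)
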


where $e$ ranges over expressions, $d$ ranges over distribution expressions, and
$x \stackrel{\$}{\leftarrow} d$ stores a sample from $d$ into $x$. Commands are interpreted as functions from
memories to distributions over memories; using the fixed point theorem for Banach
spaces, one can define for each command $c$ a function $[\![c]\!] : \mathrm{Mem} \to \mathrm{Distr}(\mathrm{Mem})$,
where $\mathrm{Mem}$ is the set of well-typed maps from program variables to values.

Assertions in the language are first-order formulae over generalized expressions. 
The latter are built from tagged variables $x_1$ and $x_2$, which correspond to
the interpretation of the program variable x in the first and second memories. 
Assertions in pRHL are deterministic and do not refer to probabilities. 

Definition 3. A pRHL judgment is a quadruple of the form $\models c_1 \sim c_2 : \Psi \Rightarrow \Phi$,
where $\Psi$ and $\Phi$ are assertions, and $c_1$ and $c_2$ are separable statements, i.e. they
do not have any variable in common. A judgment is valid iff for all memories
$m_1$ and $m_2$, we have $(m_1, m_2) \models \Psi \implies ([\![c_1]\!](m_1), [\![c_2]\!](m_2)) \models \Phi^\dagger$.

Judgments can be proved valid with a variety of rules. 

Two-sided and one-sided rules. The pRHL logic features two-sided rules (Figure 1) 
and one-sided rules (Figure 2). Roughly speaking, two-sided rules relate two 
commands with the same structure and control flow, while one-sided rules relate 
two commands with possibly different structure or control flow; the latter rules 
allow pRHL to express asynchronous couplings between programs that may 
exhibit different control flow. 

We point out two rules that will be especially important for our purposes. 
The rule [Sample] is used for relating two sampling commands. Note that it 
requires an injective function $f : T_1 \to T_2$ from the domain of the first sampling
command to the domain of the second sampling command. When the two sampling
commands have the same domain (as will be the case in our examples), $f$ is
simply a bijection on $T = T_1 = T_2$. This bijection gives us the freedom to specify
the relation between the two samples when we couple the samples. 
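The role of the function in rule [Sample] can be illustrated with a short Python sketch: drawing $v$ from the first distribution and pairing it with $f(v)$ gives a joint distribution whose first marginal is the first distribution by construction, and the premise $d_1(v) = d_2(f\,v)$ is what makes the second marginal correct. The helper names and the two example coins are assumptions for illustration only.

```python
from fractions import Fraction


def sample_rule_coupling(d1, f):
    """Joint law of (v, f(v)) when v is drawn from d1."""
    joint = {}
    for v, p in d1.items():
        key = (v, f(v))
        joint[key] = joint.get(key, Fraction(0)) + p
    return joint


def second_marginal(joint):
    """Distribution of the second component of a joint {(a, b): prob}."""
    out = {}
    for (_, w), p in joint.items():
        out[w] = out.get(w, Fraction(0)) + p
    return out


def negate(b):
    return not b


fair = {True: Fraction(1, 2), False: Fraction(1, 2)}
biased = {True: Fraction(2, 3), False: Fraction(1, 3)}

# For the fair coin, negation satisfies d1(v) = d2(f v) with d2 = d1.
fair_mirror = second_marginal(sample_rule_coupling(fair, negate))

# For the biased coin the premise fails: the second marginal is a different coin.
biased_mirror = second_marginal(sample_rule_coupling(biased, negate))
```

With the fair coin both the identity and negation are admissible choices, which is exactly the freedom used in the random walk examples; with the biased coin only the identity satisfies the premise.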

The rule [While] is the standard while rule adapted to pRHL. Note that we 
require the guards of the two commands to be equal (so in particular the two
loops must make the same number of iterations), and $\Phi$ plays the role of the
while loop invariant as usual. 





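[SampleL]
\[ \models x_1 \stackrel{\$}{\leftarrow} d_1 \sim \mathsf{skip} : \forall v.\ \Phi[v/x_1] \Rightarrow \Phi \]

[IfL]
\[ \frac{\models c_1 \sim c : \Psi \wedge e_1 \Rightarrow \Phi \qquad \models c_1' \sim c : \Psi \wedge \neg e_1 \Rightarrow \Phi}{\models \mathsf{if}\ e_1\ \mathsf{then}\ c_1\ \mathsf{else}\ c_1' \sim c : \Psi \Rightarrow \Phi} \]

[WhileL]
\[ \frac{\models c_1 \sim \mathsf{skip} : \Phi \wedge e_1 \Rightarrow \Phi \qquad \mathsf{while}\ e_1\ \mathsf{do}\ c_1\ \text{lossless}}{\models \mathsf{while}\ e_1\ \mathsf{do}\ c_1 \sim \mathsf{skip} : \Phi \Rightarrow \Phi \wedge \neg e_1} \]

Fig. 2: One-sided proof rules (selection)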


Structural and program transformation rules. pRHL also features structural rules 
that are very similar to those of Hoare logic, including the rule of consequence and 
the case rule. In addition, it features a rule for program transformations, based 
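on an equivalence relation $\simeq$ that provides a sound approximation of semantical
equivalence. For our examples, it is sufficient that the relation $\simeq$ models loop
range splitting and biased coin splitting, as given by the following clauses:

\[ \mathsf{while}\ e\ \mathsf{do}\ c \;\simeq\; \mathsf{while}\ e \wedge e'\ \mathsf{do}\ c;\ \mathsf{while}\ e\ \mathsf{do}\ c \]
\[ x \stackrel{\$}{\leftarrow} \mathrm{Bern}(p_1 \cdot p_2) \;\simeq\; x_1 \stackrel{\$}{\leftarrow} \mathrm{Bern}(p_1);\ x_2 \stackrel{\$}{\leftarrow} \mathrm{Bern}(p_2);\ x \leftarrow x_1 \wedge x_2 \]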

Figure 3 provides a selection of structural and program transformation rules. 


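[Conseq]
\[ \frac{\models c_1 \sim c_2 : \Psi' \Rightarrow \Phi' \qquad \Psi \Rightarrow \Psi' \qquad \Phi' \Rightarrow \Phi}{\models c_1 \sim c_2 : \Psi \Rightarrow \Phi} \]

[Case]
\[ \frac{\models c_1 \sim c_2 : \Psi \wedge \Psi' \Rightarrow \Phi \qquad \models c_1 \sim c_2 : \Psi \wedge \neg \Psi' \Rightarrow \Phi}{\models c_1 \sim c_2 : \Psi \Rightarrow \Phi} \]

[Equiv]
\[ \frac{\models c_1' \sim c_2' : \Psi \Rightarrow \Phi \qquad c_1 \simeq c_1' \qquad c_2 \simeq c_2'}{\models c_1 \sim c_2 : \Psi \Rightarrow \Phi} \]

Fig. 3: Structural and program transformation rules (selection)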


2.2 From pRHL judgments to probability judgments 

We will derive two kinds of program properties from the existence of an appropriate
probabilistic coupling. We will first discuss the mathematical theorems, where
the notation is lighter and the core idea more apparent, and then demonstrate 
how the mathematical version can be expressed in terms of pRHL judgments. 

Total variation and coupling. The first principle bounds the distance between 
two distributions in terms of a probabilistic coupling. We first define the total 
variation distance, also known as statistical distance, on distributions. 








Definition 4. Let $X$ and $X'$ be distributions over a countable set $A$. The total
variation (TV) distance between $X$ and $X'$ is defined by

\[ \| X - X' \|_{tv} = \frac{1}{2} \sum_{a \in A} | X(a) - X'(a) |. \]

To bound the distance between two distributions, it is enough to find a 
coupling and bound the probability that the two coupled variables differ. 

Theorem 1 (Total variation, see [7]). Let $X$ and $X'$ be distributions over a
countable set. Then for any coupling $Y = (X, X')$, we have

\[ \| X - X' \|_{tv} \leq \Pr_{(x, x') \sim Y} [x \neq x']. \]

This theorem is useful for reasoning about convergence of distributions. 
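The bound of Theorem 1 can be checked numerically on a small example. The sketch below assumes the usual definition of total variation distance as half the $\ell_1$ distance between probability mass functions; the function names and the two example couplings are illustrative choices.

```python
from fractions import Fraction


def tv_distance(mu, nu):
    """Total variation distance between two finite distributions."""
    keys = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in keys) / 2


def prob_differ(joint):
    """Probability that the two coupled components differ."""
    return sum(p for (a, b), p in joint.items() if a != b)


mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 4), 1: Fraction(3, 4)}

# Two couplings of mu and nu: the independent one and a better one.
independent = {(a, b): mu[a] * nu[b] for a in mu for b in nu}
aligned = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2)}

distance = tv_distance(mu, nu)            # Fraction(1, 4)
loose_bound = prob_differ(independent)    # Fraction(1, 2)
tight_bound = prob_differ(aligned)        # Fraction(1, 4)
```

Every coupling gives a valid upper bound, but the quality of the bound depends on the coupling: the independent coupling gives 1/2, while the aligned coupling matches the distance exactly.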

To describe a pRHL analog of this theorem, we first introduce some useful
notation. For all memories $m$ and expressions $e$, we write $m(e)$ for the interpretation
of $e$ in memory $m$. For all expressions $e$ of type $T$ and distributions $\mu$ over
memories, let $[\![e]\!]_\mu$ be defined as $\mathrm{Mlet}\ m = \mu\ \mathrm{in}\ \mathrm{unit}\ m(e)$; note that $[\![e]\!]_\mu$ denotes
a distribution over $T$. Similarly, for all events $E$ (modeled as a boolean expression
encoding a predicate over memories) and distributions $\mu$ over memories, let $\Pr_\mu[E]$
be defined as the probability of true in $\mathrm{Mlet}\ m = \mu\ \mathrm{in}\ \mathrm{unit}\ E(m)$. Thus, $\Pr_\mu[E]$ is the probability of event $E$
holding in the distribution $\mu$. Then, Theorem 1 can be written in terms of pRHL.

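Proposition 1. If $\models c_1 \sim c_2 : \Psi \Rightarrow (\Phi \Rightarrow v_1 = v_2)$, where $\Phi$ exclusively refers
to variables in $c_1$, then for all initial memories $m_1$ and $m_2$ that satisfy the
precondition, the total variation distance between $[\![v_1]\!]_{[\![c_1]\!](m_1)}$ and $[\![v_2]\!]_{[\![c_2]\!](m_2)}$
is at most $\Pr_{[\![c_1]\!](m_1)}[\neg \Phi]$, i.e.

\[ \| [\![v_1]\!]_{[\![c_1]\!](m_1)} - [\![v_2]\!]_{[\![c_2]\!](m_2)} \|_{tv} \leq \Pr_{[\![c_1]\!](m_1)}[\neg \Phi]. \]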

This proposition underlies the “up-to-bad” reasoning in EasyCrypt. 

Stochastic domination and coupling. A second relational property of distributions 
is stochastic domination. 

Definition 5. Let $X$ and $X'$ be distributions over a set $A$ with an order relation
$\geq$. We say $X$ stochastically dominates $X'$, written $X \geq_{sd} X'$, if for all $a \in A$,

\[ \Pr_{x \sim X}[x \geq a] \geq \Pr_{x' \sim X'}[x' \geq a]. \]

Intuitively, stochastic domination defines a partial order on distributions over A 
given an order over A. Strassen’s theorem shows that stochastic dominance is 
intimately related to coupling. 

Theorem 2 (Strassen’s theorem, see Lindvall [7]). Let $X$ and $X'$ be distributions
over a countable ordered set $A$. Then $X \geq_{sd} X'$ if and only if there is
a coupling $Y = (X, X')$ with $Y \in \mathcal{L}_{\geq}(X, X')$.

The forward direction is usually the more useful direction; we can express it 
in the following pRHL form. 

Proposition 2. If $\models c_1 \sim c_2 : \Psi \Rightarrow v_1 \geq v_2$, then for all initial memories $m_1$
and $m_2$ that satisfy the precondition, $[\![v_1]\!]_{[\![c_1]\!](m_1)} \geq_{sd} [\![v_2]\!]_{[\![c_2]\!](m_2)}$.
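For finite distributions over the integers, stochastic domination can be checked directly by comparing tail probabilities. The following Python sketch does this, assuming the non-strict order $\geq$ in the definition; it is enough to compare tails at the points of the joint support, since both tails are constant between such points. The function names and example distributions are illustrative assumptions.

```python
from fractions import Fraction


def tail(mu, a):
    """Pr[x >= a] for x drawn from the finite distribution mu."""
    return sum(p for x, p in mu.items() if x >= a)


def dominates(mu, nu):
    """True iff mu stochastically dominates nu."""
    points = set(mu) | set(nu)
    return all(tail(mu, a) >= tail(nu, a) for a in points)


half = Fraction(1, 2)
high = {1: half, 2: half}
low = {0: half, 1: half}

# A coupling of (high, low) supported on pairs with a >= b.
ordered_witness = {(1, 0): half, (2, 1): half}

# Two distributions that are incomparable for stochastic domination.
spread = {0: half, 3: half}
point = {1: Fraction(1)}
```

In this example `high` dominates `low`, and `ordered_witness` is a coupling whose support only contains ordered pairs, in line with Strassen's theorem; `spread` and `point` dominate each other in neither direction, so the relation is only a partial order.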


3 Warming up: Random walks 


We warm up with couplings for random walks. These numeric processes model 
the evolution of a token over a discrete space: at each time step the token will 
choose its next movement randomly. We will show that if the two initial positions 
satisfy some property, the distributions of the two positions converge. 


3.1 The basic random walk 

Our first example is a random walk on the integers. Starting at an initial position, 
at each step we flip a fair coin. If heads, we move one step to the right. Otherwise, 
we move one step to the left. The code for running the process for k steps is presented
in the left side of Figure 4. The variable H stores the history of coin flips. While 
this history isn’t needed for computation of the result (it is ghost code), we will 
state invariants in terms of this history. 


pos ← start; i ← 0;
while i < k do
  b ←$ {0,1};
  H ← b :: H;
  if b then pos++ else pos-- fi;
  i ← i + 1;
end
return pos

(a) Random walk on $\mathbb{Z}$


pos ← start; i ← 0;
while i < k do
  mov ←$ {0,1};
  dir ←$ {0,1};
  crd ←$ [1,d];
  H ← (mov, dir, crd) :: H;
  if mov then
    pos ← pos + (dir ? 1 : -1) * u(crd)
  fi;
  i ← i + 1;
end
return pos

(b) Random walk on $(\mathbb{Z}/k\mathbb{Z})^d$


Fig. 4: Two random walks


We consider two walks that start at locations $\mathsf{start}_1$ and $\mathsf{start}_2$ that are an even
distance apart: $\mathsf{start}_2 - \mathsf{start}_1 = 2n > 0$. We want to show that the distribution
on end positions in the two walks converges as $k$ increases. From Theorem 1, it
suffices to find a coupling of the two walks, i.e., a way to coordinate their random
samplings.

The basic idea is to mirror the two walks. When the first process moves 
towards the second process, we have the second process also move closer; when 
the first process moves away, we have the second process move away too. When 
the two processes meet, we have the two processes make identical moves. 

To carry out this plan, we define $\Sigma(H)$ to be the number of true in $H$ minus
the number of false; in terms of the random walk, $\Sigma(H)$ measures the net change
in position of a process with history $H$. Then, we define a predicate $P$ such that
$P(H)$ holds when $H$ contains a prefix $H'$ such that $\Sigma(H') = n$.




Accordingly, $P(H_1)$ holds when the first process has moved at least $n$ spots
to the right. Under the coupling, this means that the second process must have
moved at least $n$ spots to the left since the two particles are mirrored. Since
the first process starts out exactly $2n$ to the left of the second process, $P(H_1)$ is
true exactly when the coupled processes have already met. If the processes start
out an odd distance apart, then they will never meet under this coupling—the 
coupling preserves the parity of the distance between the two positions. 

To formalize this coupling in pRHL, we aim to couple two copies of the
program above, which we denote $c_1$ and $c_2$. We relate the two while loops with
rule [While] using the following invariant:

\[ (\mathsf{pos}_1 \neq \mathsf{pos}_2 \Rightarrow \mathsf{pos}_1 = \mathsf{start}_1 + \Sigma(H_1) \wedge \mathsf{pos}_2 = \mathsf{start}_2 - \Sigma(H_1)) \wedge (P(H_1) \Rightarrow \mathsf{pos}_1 = \mathsf{pos}_2). \]

The loop invariant states that before the two particles meet, their trajectories 
are mirrored, and that once they have met, they coincide forever. 

To prove that this is an invariant, we need to relate the loop bodies. The 
key step is relating the two sampling operations using the rule [Sample]; note 
that we must provide a bijection $f$ from booleans to booleans. We choose the
bijection based on whether the two coupled walks have met or not.

More precisely, we perform a case analysis on $\mathsf{pos}_1 = \mathsf{pos}_2$ with rule [Case]. If
they are equal then the walks move together, so we use the identity map for $f$;
this has the effect of forcing both processes to see the same sample. If the walks
are at different positions, we use the negation map ($\neg$) for $f$, so as to force the
two processes to take opposite steps.

Putting everything together, we can prove the following judgment in pRHL: 

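\[ \models c_1 \sim c_2 : \mathsf{start}_1 + 2n = \mathsf{start}_2 \Rightarrow (P(H_1) \Rightarrow \mathsf{pos}_1 = \mathsf{pos}_2). \]

By Theorem 1, we can bound the TV distance between the final positions. If two
memories $m_1, m_2$ satisfy $m_1(\mathsf{start}) + 2n = m_2(\mathsf{start})$, we have

\[ \| [\![\mathsf{pos}_1]\!]_{[\![c_1]\!](m_1)} - [\![\mathsf{pos}_2]\!]_{[\![c_2]\!](m_2)} \|_{tv} \leq \Pr_{[\![c_1]\!](m_1)}[\neg P(H_1)]. \]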

Note that the right hand side depends only on the first program. In other 
words, proving this quantitative bound on two programs is reduced to proving a 
quantitative property on a single program—this is the power of coupling. 
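The mirrored coupling can be explored outside pRHL with a small exact computation. The Python sketch below enumerates all coin sequences of length k, runs the first walk on them, and drives the second walk with negated flips until the walks meet and with identical flips afterwards. It is an illustrative model rather than the EasyCrypt development; all function names are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def walk(start, flips):
    """Final position of the simple random walk driven by the given flips."""
    pos = start
    for b in flips:
        pos += 1 if b else -1
    return pos


def coupled_walks(start1, start2, flips):
    """Walk 1 uses flips; walk 2 uses negated flips until the walks meet."""
    pos1, pos2 = start1, start2
    for b in flips:
        b2 = b if pos1 == pos2 else (not b)
        pos1 += 1 if b else -1
        pos2 += 1 if b2 else -1
    return pos1, pos2


def analyse(start1, n, k):
    """Return (TV distance, Pr[walks have not met], marginal check)."""
    start2 = start1 + 2 * n
    weight = Fraction(1, 2 ** k)
    dist1, dist2, coupled2 = Counter(), Counter(), Counter()
    not_met = Fraction(0)
    for flips in product((False, True), repeat=k):
        dist1[walk(start1, flips)] += weight
        dist2[walk(start2, flips)] += weight
        pos1, pos2 = coupled_walks(start1, start2, flips)
        coupled2[pos2] += weight
        if pos1 != pos2:
            not_met += weight
    keys = set(dist1) | set(dist2)
    tv = sum(abs(dist1[x] - dist2[x]) for x in keys) / 2
    # The coupled second walk must have the law of an ordinary walk.
    return tv, not_met, dict(coupled2) == dict(dist2)
```

For instance, `analyse(0, 1, 2)` reports a TV distance of 1/2, a probability of 1/2 that the walks have not met, and a correct second marginal. The last component confirms that negating flips before the meeting time does not change the distribution of the second walk, which is the content of the premise of rule [Sample].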

3.2 Lazy random walk on a torus 

For a more interesting example of a random walk, we can consider a walk on a 
torus. Concretely, the position is now a $d$-tuple of integers in $[0, k - 1]$. The walk
first flips a fair coin; if heads it stays put, otherwise it moves. If it moves, the
walk samples uniformly from $[1, d]$ to choose the coordinate to move, and flips a second
fair coin to determine the direction (positive or negative). The positions are
cyclic: increasing from $k - 1$ leads to $0$, and decreasing from $0$ leads to $k - 1$.

We can simulate this walk with the program in the right side of Figure 4,
where $u(i)$ is the $i$-th canonical base vector in $(\mathbb{Z}/k\mathbb{Z})^d$. As before, we store the
trace of the random walk in the list $H$. All arithmetic is done modulo $k$.


Like the simple random walk, we start this process at two locations $\mathsf{start}_1$ and
$\mathsf{start}_2$ on the torus and run for $k$ iterations. We aim to prove that the distributions
of the two walks converge as k increases by coupling the two walks, iteration by 
iteration. Each iteration, we first choose the same coordinate crd and the same 
direction dir in both walks. If the two positions coincide in coordinate crd, we 
arrange both walks to select the same movement flag mov, so that the walks either 
move together, or both stay put. If the two positions differ in crd, we arrange the 
walks to select opposite samples in mov so that exactly one walk moves. 

As in the basic random walk, we can view our coupling as letting the first 
process evolve as usual, then coordinating the samples of the second process to 
perform the coupling. In other words, given a history $H_1$ of samples for the first
process, the behavior of the second coupled process is completely specified.

Thus, we can define operators to extract the movements of each walk from
the trace $H_1$ of the samplings of the first process: $\Sigma_1(i, H_1)$ is the drift of the $i$-th
coordinate of the first process, and $\Sigma_2(i, H_1)$ is the drift of the second process.
Essentially, these operators encode the coupling by describing how the second 
process moves as a function of the first process’s samples. 

In pRHL, we will use the rule [While] with the following invariant: 

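\[ \forall i \in [1, d].\ (\Sigma_1(i, H_1) - \Sigma_2(i, H_1) = \Delta[i] \Rightarrow \mathsf{pos}_1[i] = \mathsf{pos}_2[i]) \]
\[ \quad \wedge\ (\mathsf{pos}_1[i] \neq \mathsf{pos}_2[i] \Rightarrow \mathsf{pos}_1[i] = \mathsf{start}_1[i] + \Sigma_1(i, H_1) \wedge \mathsf{pos}_2[i] = \mathsf{start}_2[i] + \Sigma_2(i, H_1)), \]

where $\Delta$ is the vector $\mathsf{start}_2 - \mathsf{start}_1$. The first conjunct states that the walks
move together in coordinate $i$ once they couple in coordinate $i$, while the second
conjunct describes the positions in terms of the history $H_1$.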

To prove that the invariant is preserved, we encode the coupling described 
above into pRHL, via three uses of the rule [Sample]. The first two samples
(for crd and dir) are coupled with $f$ being identity bijections (on $[1, d]$ and on
booleans), ensuring that the processes make identical choices. When sampling
mov, we inspect the history $H_1$ to see whether the two walks agree in position crd.
If so, we choose the identity bijection for mov; if not, we choose negation. This 
coupling is sufficient to verify the loop invariant. 

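To conclude our proof, the first conjunct in the invariant implies that we can
prove the pRHL judgment $\models c_1 \sim c_2 : \mathsf{start}_2 - \mathsf{start}_1 = \Delta \Rightarrow \Phi$, where

\[ \Phi = (\forall i \in [1, d].\ \Sigma_1(i, H_1) - \Sigma_2(i, H_1) = \Delta[i]) \Rightarrow \forall i \in [1, d].\ \mathsf{pos}_1[i] = \mathsf{pos}_2[i]. \]

Finally, Theorem 1 implies that for any two initial memories $m_1, m_2$ with
$m_2(\mathsf{start}) - m_1(\mathsf{start}) = \Delta$, we have

\[ \| [\![\mathsf{pos}_1]\!]_{[\![c_1]\!](m_1)} - [\![\mathsf{pos}_2]\!]_{[\![c_2]\!](m_2)} \|_{tv} \leq \Pr_{[\![c_1]\!](m_1)}[\exists i \in [1, d].\ \Sigma_1(i, H_1) - \Sigma_2(i, H_1) \neq \Delta[i]]. \]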

Again, proving a quantitative bound on the convergence of two distributions is 
reduced to proving a quantitative bound on a single program. 

4 Combining coupling with program transformation 

So far, we have seen examples where the coupling is proved directly on the two 
original programs $c_1$ and $c_2$. Often, it is convenient to introduce a third program
$c^*$ that is equivalent to $c_1$, and then couple $c^*$ to $c_2$. Applying transitivity (rule
[Equiv]), this gives a coupling between $c_1$ and $c_2$. Let’s consider two examples.


4.1 Two biased coins 

Consider a coin flipping process that flips a coin k times, and returns the number 
of heads observed. We consider this process run on two different biased coins: 
The first coin has probability $q_1$ of coming up heads, while the second coin has
probability $q_2$ of coming up heads with $q_1 \geq q_2$. Let the distribution on the
number of heads be $\mu_1$ and $\mu_2$ respectively.

Intuitively, it is clear that the first process is somehow bigger than the second 
process: it is more likely to see more heads, since the first coin is biased with 
a higher probability. Stochastic dominance turns out to be the proper way to 
formalize our intuition. To prove it, Proposition 2 implies that we just need to
find an appropriate coupling of the two processes.

While it is possible to define a coupling directly by carefully coordinating
the corresponding coin flips, we will give a simpler coupling that proceeds in
two stages. First, we will couple a program $c_1$ computing $\mu_1$ to an intermediate
program $c^*$. Then, we will show that $c^*$ is equivalent to a program $c_2$ computing
$\mu_2$, thus exhibiting a coupling between $\mu_1$ and $\mu_2$. Letting $r = q_2/q_1$ and denoting
the coin flip distribution with probability $p$ of sampling true by $\mathrm{Bern}(p)$, we give
the programs in Figure 5.

For the first step, we want to couple $c_1$ and $c^*$. For a rough sketch, we want to
use rule [While] with an appropriate loop invariant; here, $n_1 \geq n^*$. To show that
the invariant is preserved, we need to relate the loop bodies. We use the two-sided
rule [Sample] when sampling $x$ and $y$ (taking the bijection $f$ to be the identity),
the one-sided rule [Sample-L] to relate sampling nothing (skip) in $c_1$ with
sampling $z$ in $c^*$, and the one-sided rule [IfL] to relate the two conditionals. (The
one-sided rule is needed, since the two conditionals may take different branches.)
Thus, we can prove the judgment $\models c_1 \sim c^* : q_1 \geq q_2 \wedge r = q_2/q_1 \Rightarrow n_1 \geq n^*$.

For the second step, we need to prove that $c^*$ is equivalent to $c_2$. Here, we use a
sound approximation $\simeq$ to semantic equivalence as described in the preliminaries.
Specifically, we have $x \stackrel{\$}{\leftarrow} \mathrm{Bern}(q_1 \cdot r) \simeq y \stackrel{\$}{\leftarrow} \mathrm{Bern}(q_1);\ z \stackrel{\$}{\leftarrow} \mathrm{Bern}(r);\ x \leftarrow y \wedge z$
for the loop bodies; showing equivalence of $c^*$ and $c_2$ is then straightforward.
Thus, we can show $\models c^* \sim c_2 : q_1 \geq q_2 \wedge r = q_2/q_1 \Rightarrow n^* = n_2$. Applying rule
[Equiv] gives the final judgment $\models c_1 \sim c_2 : q_1 \geq q_2 \wedge r = q_2/q_1 \Rightarrow n_1 \geq n_2$,
showing stochastic domination by Proposition 2.
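The two-stage argument for the biased coins can be replayed on exact probabilities. The Python sketch below builds the joint law of one coupled loop iteration, where the first program uses $y$ directly and the intermediate program uses $y \wedge z$, and separately computes the distribution of the number of heads after $k$ flips. The function names and the concrete biases are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def bern(p):
    """Bernoulli distribution with probability p of True."""
    return {True: p, False: 1 - p}


def coupled_step(q1, q2):
    """Joint law of (x1, x_star) with x1 = y and x_star = y and z,
    where y ~ Bern(q1) and z ~ Bern(q2 / q1) are independent."""
    r = q2 / q1
    joint = {}
    for (y, py), (z, pz) in product(bern(q1).items(), bern(r).items()):
        key = (y, y and z)
        joint[key] = joint.get(key, Fraction(0)) + py * pz
    return joint


def heads_distribution(q, k):
    """Distribution of the number of heads in k flips of a q-biased coin."""
    dist = {0: Fraction(1)}
    for _ in range(k):
        nxt = {}
        for n, p in dist.items():
            nxt[n + 1] = nxt.get(n + 1, Fraction(0)) + p * q
            nxt[n] = nxt.get(n, Fraction(0)) + p * (1 - q)
        dist = nxt
    return dist


q1, q2 = Fraction(3, 4), Fraction(1, 2)
step = coupled_step(q1, q2)
prob_star_heads = sum(p for (_, x_star), p in step.items() if x_star)
```

In the joint law `step`, the pair (False, True) never occurs, so the first program counts a head whenever the intermediate one does, and `prob_star_heads` equals $q_2$, matching the biased coin splitting clause.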


4.2 Balls into bins: asynchronous coupling 

The examples we have seen so far are all synchronous couplings: they relate the 
iterations of the while loop in lock-step. For some applications, we may want to 
reason asynchronously, perhaps allowing one side to progress while holding the 
other side fixed. One example of an asynchronous coupling is analyzing the balls 
into bins process. We have two bins, and a set of $n$ balls. At each step, we throw
a ball into a random bin, returning the count of both bins when we have thrown
all the balls. The code is on the left side in Figure 6.


n ← 0; i ← 0;
while i < k do
  x ←$ Bern(q1);
  if x then
    n ← n + 1;
  fi;
  i ← i + 1;
end
return n

(a) Program $c_1$


n ← 0; i ← 0;
while i < k do
  y ←$ Bern(q1);
  z ←$ Bern(r);
  x ← y ∧ z;
  if x then
    n ← n + 1;
  fi;
  i ← i + 1;
end
return n

(b) Program $c^*$


n ← 0; i ← 0;
while i < k do
  x ←$ Bern(q2);
  if x then
    n ← n + 1;
  fi;
  i ← i + 1;
end
return n

(c) Program $c_2$


Fig. 5: Coupling for biased coin flips


i, binA, binB ← 0;
while i < n do
  b ←$ {0,1};
  i ← i + 1;
  if b then binA++ else binB++ fi;
end
return (binA, binB)

(a) Original programs $c_1, c_2$


i, binA, binB ← 0;
while i < n ∧ i < m do
  b ←$ {0,1};
  if b then binA++ else binB++ fi;
  i ← i + 1;
end
while i < n do
  b ←$ {0,1};
  if b then binA++ else binB++ fi;
  i ← i + 1;
end
return (binA, binB)

(b) Intermediate program $c^*$


Fig. 6: Coupling balls into bins


Now, we would like to consider what happens when we run two processes with 
different numbers of balls. Intuitively, it is clear that if the first process throws 
more balls than the second process, it should result in a higher load in the bins; 
we aim to prove that the first process stochastically dominates the second with 
the following coupling. Assume that the first process has more balls ($n_1 \geq n_2$).
For the first $n_2$ balls, we have the two processes do the same thing: they choose
the same bucket for their tosses. For the last $n_1 - n_2$ steps, the first process
throws the rest of the balls. Evidently, this coupling forces the bins in the first
run to have higher load than the bins in the second run.

To formalize this example, we again introduce a program $c^*$, proving equivalence
with $c_1$ and showing a coupling with $c_2$. The code for $c^*$ is on the right
side in Figure 6; we require the dummy input $m$ to be equal to $n_2$.





Proving equivalence with program $c_1$ is direct, using the loop range splitting
transformation in EasyCrypt: $\mathsf{while}\ e\ \mathsf{do}\ c \simeq \mathsf{while}\ e \wedge e'\ \mathsf{do}\ c;\ \mathsf{while}\ e\ \mathsf{do}\ c$. Once
this is done, we simply need to provide a coupling between $c^*$ and $c_2$. By our
choice of $m$, we can trivially couple the first loop in $c^*$ to the (single) loop in $c_2$,
ensuring that $\Phi = \mathsf{binA}^* \geq \mathsf{binA}_2 \wedge \mathsf{binB}^* \geq \mathsf{binB}_2$ after the first loop.

Then, we can apply the one-sided rules to couple the second loop in $c^*$ with
a skip statement in $c_2$. It is straightforward to show that $\Phi$ is an invariant in
rule [WhileL], from which we can conclude
$\models c^* \sim c_2 : n_1 \geq n_2 \wedge m = n_2 \Rightarrow \mathsf{binA}^* \geq \mathsf{binA}_2 \wedge \mathsf{binB}^* \geq \mathsf{binB}_2$,
and by equivalence of $c_1$ and $c^*$ we have
$\models c_1 \sim c_2 : n_1 \geq n_2 \Rightarrow \mathsf{binA}_1 \geq \mathsf{binA}_2 \wedge \mathsf{binB}_1 \geq \mathsf{binB}_2$,
enough for stochastic domination by Proposition 2.
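The asynchronous coupling can be mimicked in a few lines of Python: both processes share the first $n_2$ tosses, and the first process alone consumes the remaining $n_1 - n_2$ tosses. The sketch enumerates every toss sequence of the first process and pairs the resulting loads; the function names are illustrative assumptions.

```python
from itertools import product


def throw(bits):
    """Loads (binA, binB) after throwing one ball per bit."""
    bin_a = sum(1 for b in bits if b)
    return bin_a, len(bits) - bin_a


def coupled_loads(n1, n2):
    """Yield ((binA1, binB1), (binA2, binB2)) for each toss sequence.

    Process 2 reuses the first n2 tosses of process 1 (the shared loop);
    the remaining n1 - n2 tosses affect process 1 only (the one-sided loop).
    """
    if n1 < n2:
        raise ValueError("the first process must throw at least as many balls")
    for bits in product((False, True), repeat=n1):
        yield throw(bits), throw(bits[:n2])
```

Since the shared prefix is itself a uniformly random sequence of $n_2$ tosses, the second component has the right distribution, and on every trace both bins of the first process are at least as loaded as those of the second.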


5 Non-deterministic couplings: birth and death 

So far, we have seen deterministic couplings, which reuse randomness from the 
coupled processes in the coupling; this can be seen in the [Sample] rule, when 
we always choose a deterministic bijection. In this section, we will see a more 
sophisticated coupling that injects new randomness. 

For our example, we consider a classic Markov process. Roughly speaking, a 
Markov process moves within a set of states, each transition depending only on
the current state and a fresh random sample. The random walks we saw before 
are classic examples of Markov processes. 

A more complex Markov process is the birth and death chain. The state space 
is $\mathbb{Z}$, and the process starts at some integer $x$. At every time step, if the process
is at state $i$, the process has some probability $b_i$ of increasing by one, and some
probability $a_i$ of decreasing by one. Note that $a_i$ and $b_i$ may add up to less than
$1$: there can be some positive probability $1 - a_i - b_i$ where the process stays fixed.

To model this process, we define a sum type Move with three elements (Left, 
Right and Still) which correspond to the possible moves a process can make. 
Then, the chains are modeled by the code in the left of Figure 7, where the
distribution bd(state) is the distribution of moves from state.

Just like the biased coin and balls into bins processes, we want to prove 
stochastic domination for two processes started at states $\mathsf{start}_1 \geq \mathsf{start}_2$ via
coupling. The difficulty is that if the processes become adjacent and they both 
move, the two processes may swap positions, losing stochastic domination. 

The solution is to use a special coupling when the two processes are on 
two adjacent states as in Mufa [8]. Unlike the previous examples, the coupling 
is not deterministic: the behavior of one process is not fully determined by 
the randomness of the other. Our loop invariant is the usual one for stochastic 
domination: $\mathsf{state}_1 \geq \mathsf{state}_2$. To show that this invariant is preserved, we perform a
case analysis on whether $\mathsf{state}_1 = \mathsf{state}_2$, $\mathsf{state}_1 = \mathsf{state}_2 + 1$ or $\mathsf{state}_1 > \mathsf{state}_2 + 1$.

We focus on the interesting middle case, when the states are adjacent. Here, we 
perform a trick: we switch $c_1, c_2$ for two equivalent intermediate programs $c_1', c_2'$, 
and prove a coupling on the two intermediate programs. The two intermediate 
programs each sample from dcouple, a distribution on pairs of moves, and project 
out the first or second component as dir; in other words, we explicitly code $c_1', c_2'$ 
as sampling from the two marginals of a common distribution dcouple. By proving 
that the marginals are indeed distributed as bd($\mathit{state}_1$) and bd($\mathit{state}_2$), we can 
prove the equivalences $c_1 \simeq c_1'$ and $c_2 \simeq c_2'$. The code is in Figure 7(b), 
where proj [1|2] is the first projection in $c_1'$ and the second projection in $c_2'$. 

    while i < k do 
      dir ←$ bd(state); 
      if dir = Left then 
        state ← state - 1; 
      else if dir = Right then 
        state ← state + 1; 
      fi 
      H ← state :: H; 
      i ← i + 1; 
    end 
    return state 

    (a) Original programs $c_1, c_2$ 

    while i < steps do 
      d ←$ dcouple; 
      dir ← proj [1|2] d; 
      if dir = Left then 
        state ← state - 1; 
      else if dir = Right then 
        state ← state + 1; 
      fi 
      H ← state :: H; 
      i ← i + 1; 
    end 
    return state 

    (b) Intermediate programs $c_1', c_2'$ 

Fig. 7: Coupling the birth and death chain 
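
The projection step can be sketched in Python as follows: given a joint distribution on pairs of moves, compute the distribution of its first or second component and compare it with a target move distribution. The function `marginal`, the numeric distributions `bd1` and `bd2`, and the joint `toy_joint` are hypothetical and chosen only for illustration; in particular, `toy_joint` is an independent product used to exercise the check, not the dcouple of the text.

```python
from fractions import Fraction as F
from itertools import product

MOVES = ("Left", "Still", "Right")


def marginal(joint, k):
    """Distribution of the k-th component (0 or 1) of a joint over move pairs."""
    out = {m: F(0) for m in MOVES}
    for pair, p in joint.items():
        out[pair[k]] += p
    return out


# Hypothetical move distributions for two states (illustrative numbers only).
bd1 = {"Left": F(1, 4), "Still": F(1, 2), "Right": F(1, 4)}
bd2 = {"Left": F(1, 6), "Still": F(1, 2), "Right": F(1, 3)}

# Toy joint distribution: the independent product of bd1 and bd2.
toy_joint = {(m1, m2): bd1[m1] * bd2[m2] for m1, m2 in product(MOVES, MOVES)}

# Projecting out either component recovers the corresponding distribution.
assert marginal(toy_joint, 0) == bd1
assert marginal(toy_joint, 1) == bd2
```

Exact rational arithmetic is used so that the equality checks are not affected by floating-point rounding.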

All that remains is to prove a coupling between $c_1'$ and $c_2'$ satisfying the loop 
invariant $\mathit{state}_1 \geq \mathit{state}_2$. With adjacent states, dcouple is given by the following 
function from pairs of moves to probabilities: 

    op distr-adjacent a_i a_{i+1} b_i b_{i+1} (x : Move * Move) = 
      if x = (Right, Left ) then min(b_{i+1}, a_i) else 
      if x = (Still, Left ) then (b_{i+1} - a_i)^+ else 
      if x = (Right, Still) then (a_i - b_{i+1})^+ else 
      if x = (Still, Right) then a_{i+1} else 
      if x = (Left , Still) then b_i else 
      if x = (Still, Still) then 1 - min(b_{i+1}, a_i) - a_{i+1} - b_i - |b_{i+1} - a_i| else 
      if x = (_, _) then 0. 


where $x^+$ denotes the positive part of $x$: simply $x$ if $x > 0$, and 0 otherwise. Note 
that the case (Left, Right) has probability 0: this forbids the first process from 
skipping past the second process. 
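
The following Python sketch transcribes the listing into a table and checks three properties for one choice of parameters: the entries sum to 1, the pair (Left, Right) carries no mass, and every pair with positive mass keeps the first state at or above the second when the first starts one above. The numeric parameters are hypothetical, the row for (Still, Left) is this sketch's reading of a damaged entry, and the helper names are not from the original development.

```python
from fractions import Fraction as F

DELTA = {"Left": -1, "Still": 0, "Right": 1}


def pos(x):
    """Positive part of x."""
    return max(x, F(0))


def distr_adjacent(a_i, a_i1, b_i, b_i1):
    """Table of probabilities for pairs of moves; unlisted pairs have mass 0."""
    return {
        ("Right", "Left"): min(b_i1, a_i),
        ("Still", "Left"): pos(b_i1 - a_i),  # assumed reading of this row
        ("Right", "Still"): pos(a_i - b_i1),
        ("Still", "Right"): a_i1,
        ("Left", "Still"): b_i,
        ("Still", "Still"): 1 - min(b_i1, a_i) - a_i1 - b_i - abs(b_i1 - a_i),
    }


def preserves_invariant(table, lower=0):
    """Check state1 >= state2 after every supported move pair,
    starting from state1 = lower + 1 and state2 = lower."""
    return all(
        lower + 1 + DELTA[m1] >= lower + DELTA[m2]
        for (m1, m2), p in table.items()
        if p > 0
    )


# Hypothetical parameters, small enough that every entry is non-negative.
table = distr_adjacent(F(1, 5), F(1, 4), F(1, 6), F(1, 10))
assert all(p >= 0 for p in table.values())
assert sum(table.values()) == 1
assert table.get(("Left", "Right"), F(0)) == 0
assert preserves_invariant(table)
```

The (Still, Still) entry is non-negative only when the remaining entries sum to at most 1, which the sketch checks for its sample parameters rather than in general.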

Now the coupling is easy: we simply require both samples from dcouple to be 
the same. Since $\mathit{state}_1 = \mathit{state}_2 + 1$ and the distribution never returns (Left, Right), 
the loop invariant is trivially preserved. This shows the desired coupling, and 
stochastic domination follows by Proposition 2. 


6 Conclusion and future work 

We have established the connection between relational verification of probabilistic 
programs using pRHL and probabilistic couplings. Furthermore, we have exploited 
this connection, using pRHL to verify several well-known examples of couplings 
from the literature on randomized algorithms. More broadly, our work is a 
blend of the two main approaches to relational verification: (i) reasoning 
about a single program combining the two programs (e.g. cross-products [12], 
self-composition [3], and product programs [2]); and (ii) using a program logic 
to reason directly about two programs (e.g. relational Hoare logic [5], relational 
separation logic [11], and pRHL [1]). We have only scratched the surface in 
verifying couplings; we see three natural directions for future work. 

A more general verification framework. When we construct a coupling, the core 
data is encoded by the bijection $f$ for the rule [Sample], which specifies how the 
two samples are to be coupled. A careful look at the rule reveals that the coupling 
is a deterministic coupling, as defined by Villani [10]. While such couplings are 
already quite powerful, there are many examples of couplings that cannot be 
verified using deterministic couplings. We have worked around this difficulty 
by using program transformation rules, but an alternative approach could be 
interesting: allow more general binary relations when relating samples, rather 
than just bijections. This generalization could enable a more general class of 
couplings and yield cleaner proofs. 
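
As a small illustration of a deterministic coupling, the Python sketch below builds the joint distribution that pairs each sample $x$ with $f(x)$. It assumes that the relevant side condition is that $f$ is a bijection between the supports that preserves probabilities; the precise premise of [Sample] is not restated here, and the function name `coupling_from_bijection` is hypothetical.

```python
from fractions import Fraction as F


def coupling_from_bijection(d1, d2, f):
    """Joint distribution putting mass d1[x] on the pair (x, f(x)).

    Raises ValueError unless f is a probability-preserving bijection
    from the support of d1 onto the support of d2.
    """
    supp1 = [x for x, p in d1.items() if p > 0]
    supp2 = {y for y, p in d2.items() if p > 0}
    images = [f(x) for x in supp1]
    if len(set(images)) != len(images) or set(images) != supp2:
        raise ValueError("f is not a bijection between the supports")
    if any(d1[x] != d2[f(x)] for x in supp1):
        raise ValueError("f does not preserve probabilities")
    return {(x, f(x)): d1[x] for x in supp1}


# Two fair coins coupled so that they always disagree.
coin = {0: F(1, 2), 1: F(1, 2)}
opposite = coupling_from_bijection(coin, coin, lambda x: 1 - x)
assert opposite == {(0, 1): F(1, 2), (1, 0): F(1, 2)}
```

In such a coupling the second sample is a function of the first, so a joint distribution in which one first component is paired with several second components falls outside this form.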

Moreover, it would be interesting to extend EasyCrypt with mechanisms for 
handling the non-relational reasoning in couplings. To prove quantitative bounds 
on total variation in the random walk example, we need to bound the time it 
takes for a single random walk to reach a certain position. Proving such bounds 
requires more complex, non-relational reasoning. We are currently developing a 
program logic for this purpose, but it has not yet been integrated into EasyCrypt. 

Extending to shift and path coupling. The couplings realized in the random walks 
are instances of exact couplings, where we reason about synchronized samples: we 
relate the first samples, the second samples, etc. A more general notion of coupling 
is shift-coupling, where we are allowed to first shift one process by a random 
number of samples, then couple. The general theory of path couplings provides 
inequalities of a similar shape to those for exact couplings, allowing 
mathematical reasoning inside the logic through the [Conseq] rule. These 
coupling notions are complex, and it is not yet clear how they can be verified. 

Other examples. There are many other examples of couplings, in particular 
the proof of the constructive Lovász Local Lemma, a fundamental tool used in 
the probabilistic method, a powerful proof technique for showing existence in 
combinatorics. 